cryptography Cryptography, or cryptology (from grc, , translit=kryptós "hidden, secret"; and ''graphein'', "to write", or ''-logia'', "study", respectively), is the practice and study of techniques for secure communication in the presence of adver ...

, the Merkle–Damgård construction or Merkle–Damgård hash function is a method of building

collision-resistant In cryptography, collision resistance is a property of cryptographic hash functions: a hash function ''H'' is collision-resistant if it is hard to find two inputs that hash to the same output; that is, two inputs ''a'' and ''b'' where ''a'' ≠ '' ...

cryptographic hash function A cryptographic hash function (CHF) is a hash algorithm (a map of an arbitrary binary string to a binary string with fixed size of n bits) that has special properties desirable for cryptography: * the probability of a particular n-bit output re ...

s from collision-resistant

one-way compression function In cryptography, a one-way compression function is a function that transforms two fixed-length inputs into a fixed-length output. The transformation is "one-way", meaning that it is difficult given a particular output to compute inputs which compre ...

s. Goldwasser, S. and Bellare, M.br>"Lecture Notes on Cryptography"
Summer course on cryptography, MIT, 1996-2001 This construction was used in the design of many popular hash algorithms such as MD5,

SHA-1 In cryptography, SHA-1 (Secure Hash Algorithm 1) is a cryptographically broken but still widely used hash function which takes an input and produces a 160-bit (20-byte) hash value known as a message digest – typically rendered as 40 hexadecima ...

and

SHA-2 SHA-2 (Secure Hash Algorithm 2) is a set of cryptographic hash functions designed by the United States National Security Agency (NSA) and first published in 2001. They are built using the Merkle–Damgård construction, from a one-way compression ...

. The Merkle–Damgård construction was described in Ralph Merkle's

Ph.D. A Doctor of Philosophy (PhD, Ph.D., or DPhil; Latin: or ') is the most common degree at the highest academic level awarded following a course of study. PhDs are awarded for programs across the whole breadth of academic fields. Because it is ...

thesis A thesis ( : theses), or dissertation (abbreviated diss.), is a document submitted in support of candidature for an academic degree or professional qualification presenting the author's research and findings.International Standard ISO 7144: ...

in 1979.

Ralph Merkle Ralph C. Merkle (born February 2, 1952) is a computer scientist and mathematician. He is one of the inventors of public-key cryptography, the inventor of cryptographic hashing, and more recently a researcher and speaker on cryonics. Contribution ...

and

Ivan Damgård Ivan Bjerre Damgård (born 1956) is a Danish cryptographer and currently a professor at the Department of Computer Science, Aarhus University, Denmark. Academic background In 1983, he obtained a master's degree in mathematics (with minors in ...

independently proved that the structure is sound: that is, if an appropriate padding scheme is used and the compression function is

, then the hash function will also be collision-resistant. The Merkle–Damgård hash function first applies an MD-compliant padding function to create an input whose size is a multiple of a fixed number (e.g. 512 or 1024) — this is because compression functions cannot handle inputs of arbitrary size. The hash function then breaks the result into blocks of fixed size, and processes them one at a time with the compression function, each time combining a block of the input with the output of the previous round. In order to make the construction secure, Merkle and Damgård proposed that messages be padded with a padding that encodes the length of the original message. This is called ''length padding'' or Merkle–Damgård strengthening. In the diagram, the one-way compression function is denoted by ''f'', and transforms two fixed length inputs to an output of the same size as one of the inputs. The algorithm starts with an initial value, the

initialization vector In cryptography, an initialization vector (IV) or starting variable (SV) is an input to a cryptographic primitive being used to provide the initial state. The IV is typically required to be random or pseudorandom, but sometimes an IV only needs to ...

(IV). The IV is a fixed value (algorithm or implementation specific). For each message block, the compression (or compacting) function ''f'' takes the result so far, combines it with the message block, and produces an intermediate result. The last block is padded with zeros as needed and bits representing the length of the entire message are appended. (See below for a detailed length padding example.) To harden the hash further, the last result is then sometimes fed through a ''finalisation function''. The finalisation function can have several purposes such as compressing a bigger internal state (the last result) into a smaller output hash size or to guarantee a better mixing and

avalanche effect In cryptography, the avalanche effect is the desirable property of cryptographic algorithms, typically block ciphers and cryptographic hash functions, wherein if an input is changed slightly (for example, flipping a single bit), the output changes ...

on the bits in the hash sum. The finalisation function is often built by using the compression function. (Note that in some documents a different terminology is used: the act of length padding is called "finalisation".)

Security characteristics

The popularity of this construction is due to the fact, proven by Merkle and Damgård, that if the one-way compression function ''f'' is collision resistant, then so is the hash function constructed using it. Unfortunately, this construction also has several undesirable properties: * Second

preimage attack In cryptography, a preimage attack on cryptographic hash functions tries to find a message that has a specific hash value. A cryptographic hash function should resist attacks on its preimage (set of possible inputs). In the context of attack, th ...

s against long messages are always much more efficient than brute force. * Multicollisions (many messages with the same hash) can be found with only a little more work than collisions. * "Herding attacks", which combines the cascaded construction for multicollision finding (similar to the above) with collisions found for a given prefix (chosen-prefix collisions). This allows for constructing highly specific colliding documents, and it can be done for more work than finding a collision, but much less than would be expected to do this for a

random oracle In cryptography, a random oracle is an oracle (a theoretical black box) that responds to every ''unique query'' with a (truly) random response chosen uniformly from its output domain. If a query is repeated, it responds the same way every time th ...

. * Length extension: Given the hash of an unknown input ''X'', it is easy to find the value of

H(\mathsf(X) \,  Y)

, where ''pad'' is the padding function of the hash. That is, it is possible to find hashes of inputs related to ''X'' even though ''X'' remains unknown. Length extension attacks were actually used to attack a number of commercial web message authentication schemes such as one used by

Flickr Flickr ( ; ) is an American image hosting and video hosting service, as well as an online community, founded in Canada and headquartered in the United States. It was created by Ludicorp in 2004 and was a popular way for amateur and professional ...

Wide pipe construction

Due to several structural weaknesses of Merkle–Damgård construction, especially the length extension problem and multicollision attacks,

Stefan Lucks Stefan Lucks is a researcher in the fields of communications security and cryptography. Lucks is known for his attack on Triple DES, and for extending Lars Knudsen's Square attack to Twofish, a cipher outside the Square family, thus generalisi ...

proposed the use of the wide-pipe hash instead of Merkle–Damgård construction. The wide-pipe hash is very similar to the Merkle–Damgård construction but has a larger internal state size, meaning that the bit-length that is internally used is larger than the output bit-length. If a hash of ''n'' bits is desired, the compression function ''f'' takes ''2n'' bits of chaining value and ''m'' bits of the message and compresses this to an output of ''2n'' bits. Therefore, in a final step a second compression function compresses the last internal hash value (''2n'' bit) to the final hash value (''n'' bit). This can be done as simply as discarding half of the last ''2n''-bit-output. SHA-512/224 and SHA-512/256 take this form since they are derived from a variant of SHA-512. SHA-384 and SHA-224 are similarly derived from SHA-512 and SHA-256, respectively, but the width of their pipe is much less than ''2n''.

Fast wide pipe construction

It has been demonstrated by Mridul Nandi and

Souradyuti Paul Souradyuti Paul (born 1976) is an Indian cryptologist. Formerly a member of COSIC, he is currently working as an associate professor at Indian Institute of Technology Bhilai and a Guest Researcher for the National Institute of Standards and Techno ...

that the Widepipe hash function can be made approximately twice as fast if the widepipe state can be divided in half in the following manner: one half is input to the succeeding compression function while the other half is combined with the output of that compression function.Mridul Nandi and Souradyuti Paul. Speeding Up the Widepipe: Secure and Fast Hashing. In Guang Gong and Kishan Gupta, editor, Indocrypt 2010, Springer, 2010. The main idea of the hash construction is to forward half of the previous chaining value forward to XOR it to the output of the compression function. In so doing the construction takes in longer message blocks every iteration than the original widepipe. Using the same function ''f'' as before, it takes ''n'' bit chaining values and ''n+m'' bits of the message. However, the price to pay is the extra memory used in the construction for feed-forward.

MD-compliant padding

As mentioned in the introduction, the padding scheme used in the Merkle–Damgård construction must be chosen carefully to ensure the security of the scheme.

Mihir Bellare Mihir Bellare is a cryptographer and professor at the University of California San Diego. He has published several seminal papers in the field of cryptography (notably in the area of provable security), many of which were co-written with Phillip R ...

gives sufficient conditions for a padding scheme to possess to ensure that the MD construction is secure: it suffices that the scheme be "MD-compliant" (the original length-padding scheme used by Merkle is an example of MD-compliant padding). Conditions: *

M

is a prefix of

\mathsf(M).

*If

, M_,  = , M_,

then

, \mathsf(M_),  = , \mathsf(M_), .

*If

, M_,  \neq , M_,

then the last block of

\mathsf(M_)

is different from the last block of

\mathsf(M_).

Where

, X,

denotes the length of

X

. With these conditions in place, we find a collision in the MD hash function ''exactly when'' we find a collision in the underlying compression function. Therefore, the Merkle–Damgård construction is provably secure when the underlying compression function is secure.

Length padding example

To be able to feed the message to the compression function, the last block needs to be padded with constant data (generally with zeroes) to a full block. For example, suppose the message to be hashed is "HashInput" (9 octet string, 0x48617368496e707574 in

ASCII ASCII ( ), abbreviated from American Standard Code for Information Interchange, is a character encoding standard for electronic communication. ASCII codes represent text in computers, telecommunications equipment, and other devices. Because of ...

) and the block size of the compression function is 8 bytes (64 bits). We get two blocks (the padding octets shown with lightblue background color): : This implies that other messages having the same content but ending with additional zeros at the end will result in the same hash value. In the above example, another almost identical message (0x48617368496e7075 74) will generate the same hash value as the original message "HashInput" above. In other words, any message having extra zeros at the end makes it indistinguishable with the one without them. To prevent this situation, the first bit of the first padding octet is changed to "1" (0x80), yielding: : To make it resistant against the

length extension attack In cryptography and computer security, a length extension attack is a type of attack where an attacker can use Hash(''message1'') and the length of ''message1'' to calculate Hash(''message1'' ‖ ''message2'') for an attacker-controlled ''message2 ...

, the message length is added in an extra block at the end (shown with yellow background color): : However, most common implementations use a fixed bit-size (generally 64 or 128 bits in modern algorithms) at a fixed position at the end of the last block for inserting the message length value (see '' SHA-1 pseudocode''). Further improvement can be made by inserting the length value in the last block if there is enough space. Doing so avoids having an extra block for the message length. If we assume the length value is encoded on 5 bytes (40 bits), the message becomes: : Note that storing the message length out-of-band in metadata, or otherwise embedded at the start of the message is an effective mitigation of the length extension attack, as long as invalidation of either the message length and checksum are both considered failure of integrity checking.

References

*
Handbook of Applied Cryptography
' by Menezes, van Oorschot and Vanstone (2001), chapter 9. *

', by Jonathan Katz and Yehuda Lindell. Chapman and Hall/CRC Press, August 2007, page 134 (construction 4.13). *
Cryptography Made Simple
' by Nigel Smart (2015), chapter 14. {{DEFAULTSORT:Merkle-Damgard Construction Cryptographic hash functions